Kishore Papineni

mentions: 1 · type: Person

// recent coverage · 1 mention

2026-06-26 18:36 · dev.to · large-language-models

How We Actually Measure Whether an LLM's Output Is Good - BLEU, COMET and BLEURT

Shrijith Venkatramana, who is building git-lrc, explains how LLM evaluation metrics evolved from BLEU to BLEURT and COMET. BLEU, introduced in 2002, measures n-gram overlap and correlates with human jud…

// top 7 co-occurring entities

Shrijith Venkatramana (1) · git-lrc (1) · IBM (1) · Google Research (1) · BLEU (1) · BLEURT (1) · COMET (1)